B. Schumacher and M. Westmoreland have established a quantum analog of a well-known classical
information theory result on the role of relative entropy as a measure of non-optimality in (classical)
data compression. In this paper, we provide an alternative, simple and constructive proof of this
result by constructing quantum compression codes (schemes) from classical data compression codes.
Moreover, as the quantum data compression/coding task can be effectively reduced to a
(quasi-)classical one, we show that relevant results from classical information theory and data compression
become applicable and therefore can be extended to the quantum domain.
I. INTRODUCTION



Quantum relative entropy, defined as $S(\rho\|\sigma) = \mathrm{tr}(\rho\log\rho) - \mathrm{tr}(\rho\log\sigma)$, is always non-negative and is equal to zero
if and only if $\rho = \sigma$, where $\rho$ and $\sigma$ are density operators defined on the same finite-dimensional Hilbert space. For
a review of the properties of quantum relative entropy as well as the properties of quantum (von Neumann) entropy,
defined as $S(\rho) = -\mathrm{tr}(\rho\log\rho)$, see, for instance, [10].

Both quantum (von Neumann) entropy and quantum relative entropy can be viewed as quantum extensions of their
respective classical counterparts and inherit many of their properties and applications. Such inheritance, however, is
often neither automatic nor obvious. For example, classical information theory [5, Chap. 5] provides the following
natural interpretations of Shannon entropy and relative entropy. A compression code (scheme), for example, a Huffman
code, which is made optimal for an information source with a given probability distribution $p$ of characters, would
require $H(p)$ bits per input letter to encode a sequence emitted by the source, where $H(p)$ stands for the Shannon
entropy of the probability distribution $p$ and is defined by

$$H(p) \triangleq -\sum_x p(x)\log p(x). \tag{1}$$

Thus, using a compression code¹ (scheme) made optimal for an information source with probability distribution $q$,
also called a "$q$-source", one could compress such a source with $H(q)$ bits per input character. If, to compress the
same $q$-source, one uses a different compression code (scheme) optimal for a source with a different probability
distribution $p$ (i.e. a $p$-source), it will require $H(q) + D(q\|p)$ bits per input character, where $D(q\|p)$ stands for the
relative entropy function for the probability distributions $q$ and $p$ and is defined by

$$D(q\|p) \triangleq \sum_x q(x)\log\frac{q(x)}{p(x)}. \tag{2}$$



Accordingly, the quantity $D(q\|p)$ represents an additional encoding cost which arises from encoding a $q$-source with
a code optimal for a $p$-source. As shown by Schumacher and Westmoreland, this (classical) data compression interpretation
of (classical) Shannon and relative entropies extends to the quantum domain if one replaces classical information sources
having probability distributions $p$ and $q$ with quantum information sources having density matrices $\rho$ and $\sigma$,
respectively, as well as (classical) Shannon and relative entropies with (quantum) von Neumann and relative entropies,
respectively.

In this paper, we provide an alternative, simple and constructive proof of this result. We build quantum
compression codes (schemes) from classical compression codes. As the original source density matrix $\rho$ can simply be
replaced with $\tilde{\rho}$, a so-called "effective" source matrix with respect to our computational basis, our quantum
compression/coding task is effectively reduced to a (quasi-)classical one, with classical information sources having probability
distributions $p_{\tilde{\rho}}$ and $p_{\sigma}$, which are formed by the eigenvalues of $\tilde{\rho}$ and $\sigma$, respectively. Thus, relevant results from
classical information theory and data compression become applicable and therefore can be extended to the quantum
domain.



* Electronic address: akaltchenko@wlu.ca

¹ See Subsection II A for the formal definitions of information sources, compression codes, etc.
Our paper is organized as follows. In Section II we briefly review some classical data compression definitions and
results, including information sources and source codes (also known as "data compression codes"). In Section III we
introduce quantum information sources and quantum data compression, and discuss how classical data compression codes
can be "turned" into quantum data compression codes. In Section IV we introduce an "effective" density matrix for
a quantum information source. In Section V we state our main results, Lemma V.1 and Theorem 3, followed by a
discussion. We close the paper with a brief conclusion section.

II. CLASSICAL DATA COMPRESSION 
A. Classical Information Sources, Compression Codes, and Optimal Data Compression 

Let $\mathcal{A} = \{a_1, a_2, \ldots, a_{|\mathcal{A}|}\}$ be a data sequence alphabet, where the notation $|\mathcal{A}|$ stands for the cardinality of $\mathcal{A}$.
The binary alphabet $\{0, 1\}$ and the Latin alphabet $\{a, b, c, d, \ldots, x, y, z\}$ are widely used alphabet examples. We will also use
the notation |sequence_name| to denote the sequence length. Let $\mathcal{A}^*$, $\mathcal{A}^+$, and $\mathcal{A}^\infty$ be respectively the set of all finite
sequences, including the empty sequence, the set of all finite sequences of positive length, and the set of all infinite
sequences, where all the sequences are drawn from $\mathcal{A}$. For any positive integer $n$, $\mathcal{A}^n$ denotes the set of all sequences
of length $n$ from $\mathcal{A}$. A sequence from $\mathcal{A}$ is sometimes called an $\mathcal{A}$-sequence. To save space, we denote an $\mathcal{A}$-sequence
$x_1, x_2, \ldots, x_n$ by $x^n$ for every integer $n > 0$.

Definition II.1 An alphabet $\mathcal{A}$ independently and identically distributed (i.i.d.) information source is an i.i.d.
discrete stochastic process, which is defined by a single-letter (marginal) probability distribution of characters from $\mathcal{A}$.
We extend the single-letter probability distribution to a probability measure on the probability space constructed for the
set $\mathcal{A}^+$ of all $\mathcal{A}$-sequences. We say "$\mathcal{A}$-source" or "$p$-source" to emphasize that the source has a certain alphabet $\mathcal{A}$
or a probability distribution $p$.

Thus, we assume that every $\mathcal{A}$-sequence is generated by an i.i.d. information source.

Definition II.2 An alphabet $\mathcal{A}$ source code $C$ is a mapping from $\mathcal{A}^+$ into $\{0, 1\}^+$, where, according to our notation,
$\mathcal{A}^+$ is the set of all $\mathcal{A}$-strings and $\{0, 1\}^+$ is the set of all binary strings.

We restrict our attention to uniquely decodable codes, that is, for every $n$ and every $x_1, x_2, \ldots, x_n$, we have

$$C^{-1}\big(C(x_1, x_2, \ldots, x_n)\big) = x_1, x_2, \ldots, x_n.$$

For every $n$ and every $x_1, x_2, \ldots, x_n$, we call the sequence $C(x_1, x_2, \ldots, x_n)$ a codeword. We often call a source code a
compression code and use the terms "source code" and "compression code" interchangeably.

Definition II.3 For any given source and an alphabet $\mathcal{A}$ source code $C$, we define the compression rate of $C$ in bits per
input character by $\frac{1}{n}\big\langle |C(x_1, x_2, \ldots, x_n)| \big\rangle_p$, where the length of a compressed sequence (codeword) $C(x_1, x_2, \ldots, x_n)$
is divided by $n$ and averaged with respect to the information source's probability measure $p$.

Theorem 1 (Optimal Data Compression) For an i.i.d. information source with probability measure $p$ and any
compression code, the best achievable compression rate is given by the Shannon entropy $H(p)$, which is defined by

$$H(p) \triangleq -\sum_x p(x)\log p(x). \tag{3}$$

Definition II.4 For any probability distribution $p$, we call a source code optimal for an i.i.d. source with probability
distribution $p$ if the compression rate for this code is equal to the Shannon entropy $H(p)$. We denote such a code by $C_p$.

Remark 1 Optimal compression codes exist for every source probability distribution.
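
To make Theorem 1 and Definition II.4 concrete, the following is a minimal sketch (not taken from the paper) that builds a binary Huffman code for a small alphabet and compares its average codeword length with $H(p)$. The function names and the example distribution are illustrative assumptions. The distribution is chosen to be dyadic (all probabilities are powers of 1/2), which is the special case in which a single-letter Huffman code meets the entropy exactly; for general distributions the entropy is only approached by coding long blocks.

```python
import heapq
from math import log2


def shannon_entropy(p):
    """Shannon entropy H(p) in bits; symbols with p(x) = 0 contribute nothing."""
    return -sum(px * log2(px) for px in p.values() if px > 0)


def huffman_code_lengths(p):
    """Return {symbol: codeword length} of a binary Huffman code built for p."""
    # Heap entries are (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(px, i, {sym: 0}) for i, (sym, px) in enumerate(p.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]


def average_length(p, lengths):
    """Expected codeword length in bits per input character."""
    return sum(p[sym] * lengths[sym] for sym in p)


# Hypothetical dyadic source used only for illustration.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(p)
rate = average_length(p, lengths)
entropy = shannon_entropy(p)
print(lengths, rate, entropy)
```

For this source the code assigns lengths 1, 2, 3, 3 and the average length coincides with $H(p) = 1.75$ bits per character.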

B. Classical Data Compression and Relative Entropy 

Suppose we have a compression code $C_p$, that is, $C_p$ is optimal for an $\mathcal{A}$-source with a probability distribution $p$.
What happens to the compression rate if we use $C_p$ to compress another $\mathcal{A}$-source with a different distribution $q$?
It is not difficult to infer that the compression rate will generally increase, but by how much? The increase in the
compression rate will be equal to the relative entropy function $D(q\|p)$, which is defined by

$$D(q\|p) \triangleq \sum_x q(x)\log\frac{q(x)}{p(x)}, \tag{4}$$

where the relative entropy function $D(q\|p)$ is always non-negative and equal to zero if and only if $q = p$. More precisely,
using a code $C_q$ (i.e. optimal for a distribution $q$), one could encode a $q$-source with $H(q)$ bits per input character.
If, instead, one uses a code $C_p$ (i.e. optimal for $p$) to encode a $q$-source, it will require $H(q) + D(q\|p)$ bits per input
character.
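
The mismatch cost can be checked numerically. The sketch below is an illustration under a stated assumption: the code optimal for $p$ is idealized as assigning a codeword of (possibly non-integer) length $-\log_2 p(x)$ to each symbol $x$. Under that assumption, the average length on a $q$-source is $-\sum_x q(x)\log_2 p(x)$, which equals $H(q) + D(q\|p)$. The function names and the two distributions are hypothetical.

```python
from math import log2


def shannon_entropy(q):
    """H(q) in bits."""
    return -sum(qx * log2(qx) for qx in q if qx > 0)


def relative_entropy(q, p):
    """D(q||p) in bits; assumes p(x) > 0 wherever q(x) > 0."""
    return sum(qx * log2(qx / px) for qx, px in zip(q, p) if qx > 0)


def mismatched_rate(q, p):
    """Average ideal codeword length when a q-source is coded with lengths -log2 p(x)."""
    return sum(qx * -log2(px) for qx, px in zip(q, p) if qx > 0)


# Hypothetical binary sources used only for illustration.
q = [0.5, 0.5]
p = [0.25, 0.75]

rate_with_matched_code = shannon_entropy(q)
rate_with_mismatched_code = mismatched_rate(q, p)
penalty = relative_entropy(q, p)
print(rate_with_matched_code, rate_with_mismatched_code, penalty)
```

The difference between the two rates is the penalty $D(q\|p)$, which vanishes only when the code is matched to the source.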



III. TURNING CLASSICAL SOURCE CODES INTO QUANTUM TRANSFORMATIONS 



In this section we review how a classical one-to-one mapping (code) can be naturally extended² to a unitary quantum
transformation. But first, we need to define quantum information sources.



A. Quantum Information Sources 



Informally, a quantum identically and independently distributed (i.i.d.) information source is an infinite sequence
of identically and independently generated quantum states (say, photons, where each photon can have one of two
fixed polarizations). Each such state "lives" in its "own" copy of the same Hilbert space and is characterized by its
"own" copy of the same density operator acting on the Hilbert space. We denote the Hilbert space and the density
operator by $\mathcal{H}$ and $\rho$, respectively. Then, the sequence of $n$ states "lives" in the space $\mathcal{H}^{\otimes n}$ and is characterized by
the density operator $\rho^{\otimes n}$ acting on $\mathcal{H}^{\otimes n}$. For example, in the case of a photon sequence, we have $\dim\mathcal{H} = 2$. Thus, to
define a quantum source, we just need to specify $\rho$ and $\mathcal{H}$. In this paper, we restrict our attention to quantum sources
defined on two-dimensional Hilbert spaces. Often, just $\rho$ is given and $\mathcal{H}$ is not explicitly specified, as all Hilbert spaces
of the same dimensionality are isomorphic.



B. Turning Classical Source Codes into Quantum Transformations 



Now we are ready to explain how a classical source code can be turned into a quantum transformation. For the rest
of the paper, we will only be considering binary classical sources with alphabet $\mathcal{A} = \{0, 1\}$ and quantum information
sources defined on a two-dimensional Hilbert space.

Let $\mathcal{B} = \{|0\rangle, |1\rangle\}$ be an arbitrary, but fixed orthonormal basis in the quantum information source space $\mathcal{H}$. We
call $\mathcal{B}$ a computational basis. Then, for a sequence of $n$ quantum states, the orthonormal basis will have $2^n$ vectors of
the form

$$|e_1\rangle \otimes |e_2\rangle \otimes |e_3\rangle \otimes \cdots \otimes |e_n\rangle, \tag{5}$$

where, for every integer $1 \le i \le n$, $|e_i\rangle \in \mathcal{B} = \{|0\rangle, |1\rangle\}$. To emphasize the symbolic correspondence between binary
strings of length $n$ and the basis vectors in $\mathcal{H}^{\otimes n}$, we can rewrite (5) as

$$|e_1 e_2 e_3 \ldots e_n\rangle, \tag{6}$$

where, for every integer $1 \le i \le n$, $e_i \in \{0, 1\}$. In other words, if we add Dirac's "ket" notation at the beginning
and the end of any binary string of length $n$, then we will get a basis vector in $\mathcal{H}^{\otimes n}$.

Thus, for any computational basis $\mathcal{B} \subset \mathcal{H}$, positive integer $n$, and a classical fully reversible function (mapping)
for $n$-bit binary strings, we can define a unitary transformation on $\mathcal{H}^{\otimes n}$ as follows.

Let $\varphi : \{0, 1\}^n \to \{0, 1\}^n$ be a classical fully reversible function for $n$-bit binary strings. For every $\varphi$ and $n$, we define
a unitary operator $U_{\mathcal{B},\varphi} : \mathcal{H}_{in} \to \mathcal{H}_{out}$, with $\mathcal{H}_{in} = \mathcal{H}^{\otimes n}$, by the basis vector mapping
$U_{\mathcal{B},\varphi}\,|e_1 \ldots e_n\rangle = |\varphi(e_1 \ldots e_n)\rangle$, where,
for every integer $1 \le i \le n$, $e_i \in \{0, 1\}$ and $|e_i\rangle \in \mathcal{B} = \{|0\rangle, |1\rangle\}$. We point out that $\mathcal{H}_{out}$ denotes an output space
which is an isomorphic copy of the input space $\mathcal{H}_{in} = \mathcal{H}^{\otimes n}$, and $\mathcal{H}_{out}$ may sometimes coincide with $\mathcal{H}_{in}$.
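
As a minimal sketch of this construction (plain Python, no quantum library), the operator defined by a reversible map $\varphi$ can be written in the computational basis as a $2^n \times 2^n$ permutation matrix, and its unitarity can be checked directly. The helper names and the cyclic-shift map used as $\varphi$ are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def bits_to_index(bits):
    """Read a tuple of bits (e_1, ..., e_n) as an integer, e_1 most significant."""
    index = 0
    for b in bits:
        index = 2 * index + b
    return index


def matrix_from_map(phi, n):
    """Build the 2^n x 2^n matrix U with U|e> = |phi(e)> on computational basis vectors."""
    dim = 2 ** n
    U = [[0] * dim for _ in range(dim)]
    for e in product((0, 1), repeat=n):
        # Column = input basis vector, row = output basis vector.
        U[bits_to_index(phi(e))][bits_to_index(e)] = 1
    return U


def is_unitary(U):
    """For a real 0/1 matrix, unitarity means U^T U equals the identity."""
    dim = len(U)
    return all(
        sum(U[k][i] * U[k][j] for k in range(dim)) == (1 if i == j else 0)
        for i in range(dim)
        for j in range(dim)
    )


def cyclic_shift(e):
    """Hypothetical reversible map: rotate the bit string left by one position."""
    return e[1:] + e[:1]


U = matrix_from_map(cyclic_shift, 3)
print(is_unitary(U))
```

A map that is not one-to-one (for example, a constant map) yields a matrix that fails the same check, which is why full reversibility of $\varphi$ is required.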



For more details, see 1 or 
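An elegant way of designing quantum circuits to implement $U_{\mathcal{B},\varphi}$ was given by Langford, where $U_{\mathcal{B},\varphi}$ was
computed by the following two consecutive operations defined on $\mathcal{H}_{in} \otimes \mathcal{H}_{out}$:

$$|e_1 \ldots e_n\rangle\,|e_1^{out} \ldots e_n^{out}\rangle \to |e_1 \ldots e_n\rangle\,|e_1^{out} \ldots e_n^{out} \oplus \varphi(e_1 \ldots e_n)\rangle \tag{7}$$

$$|e_1 \ldots e_n\rangle\,|e_1^{out} \ldots e_n^{out}\rangle \to |e_1 \ldots e_n \oplus \varphi^{-1}(e_1^{out} \ldots e_n^{out})\rangle\,|e_1^{out} \ldots e_n^{out}\rangle, \tag{8}$$

where $e_i, e_i^{out} \in \{0, 1\}$, $|e_1^{out} \ldots e_n^{out}\rangle \in \mathcal{H}_{out}$, and the notation $\oplus$ stands for bit-wise modulo-2 addition.
In the above, Relation (7) maps a basis vector $|e_1 \ldots e_n\rangle\,|\underbrace{0 \ldots 0}_{n}\rangle$ to the basis vector $|e_1 \ldots e_n\rangle\,|\varphi(e_1 \ldots e_n)\rangle$. Relation (8)
maps a basis vector $|e_1 \ldots e_n\rangle\,|\varphi(e_1 \ldots e_n)\rangle$ to the basis vector $|\underbrace{0 \ldots 0}_{n}\rangle\,|\varphi(e_1 \ldots e_n)\rangle$.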
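
The action of Relations (7) and (8) on basis vectors can be traced classically. The sketch below is only a bit-level simulation on pairs of bit strings, not a quantum circuit; the function names and the rotation used as the reversible map are hypothetical choices for illustration.

```python
from itertools import product


def xor_bits(a, b):
    """Bit-wise modulo-2 addition of two equal-length bit tuples."""
    return tuple(x ^ y for x, y in zip(a, b))


def relation_7(phi, e_in, e_out):
    """|e>|e_out> -> |e>|e_out XOR phi(e)>."""
    return e_in, xor_bits(e_out, phi(e_in))


def relation_8(phi_inv, e_in, e_out):
    """|e>|e_out> -> |e XOR phi_inv(e_out)>|e_out>."""
    return xor_bits(e_in, phi_inv(e_out)), e_out


def two_step(phi, phi_inv, e):
    """Start from |e>|0...0>, apply Relation (7), then Relation (8)."""
    zeros = (0,) * len(e)
    e_in, e_out = relation_7(phi, e, zeros)
    return relation_8(phi_inv, e_in, e_out)


def rotate_left(e):
    """Hypothetical reversible map phi."""
    return e[1:] + e[:1]


def rotate_right(e):
    """Inverse of rotate_left."""
    return e[-1:] + e[:-1]


results = {e: two_step(rotate_left, rotate_right, e) for e in product((0, 1), repeat=3)}
print(results[(0, 1, 1)])
```

For every input string the first register ends in the all-zero string and the second register holds $\varphi$ of the input, so the net effect on the pair of registers is to move the input to the output register while applying $\varphi$.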

Another interesting approach for computing $U_{\mathcal{B},\varphi}$ was suggested by Jozsa and Presnell, who used a so-called
"digitalization procedure".

Remark 2 We reiterate that, given a quantum source's Hilbert space $\mathcal{H}$, the quantum transformation $U_{\mathcal{B},\varphi}$ entirely
depends on, and is determined by, the selection of the computational basis $\mathcal{B} \subset \mathcal{H}$ and a classical fully reversible function $\varphi$
(mapping) for $n$-bit binary strings. We also point out that as long as $\varphi(\cdot)$ and $\varphi^{-1}(\cdot)$ can be classically computed in
linear time and space, $U_{\mathcal{B},\varphi}$ can be computed in (log-)linear time and space, too.



C. Quantum Data Compression Overview 

We wish to compress a sequence of quantum states so that at a later stage the compressed sequence can be
decompressed. We also want the decompressed sequence to be the same as, or "close" to, the original sequence.
The measure of "closeness" is called quantum fidelity. There are a few different definitions of fidelity. One widely
used is the fidelity between a pure state $|\psi\rangle$ and a mixed state $\rho$, defined by

$$F(|\psi\rangle, \rho) = \langle\psi|\rho|\psi\rangle.$$

We now introduce a new notation $U_{\mathcal{B},C}$ for the unitary transformation $U_{\mathcal{B},\varphi}$ with $\varphi = C$. The
unitary transformation $U_{\mathcal{B},C}$ acts on a product space $\mathcal{H}^{\otimes n} \otimes \mathcal{H}^{\otimes n}$ and can be viewed as a quantum extension of
the classical compression code $C$. To refer to $U_{\mathcal{B},C}$, we will use the terms "unitary transformation" and "(quantum)
compression code" interchangeably.

Definition III.1 Given a quantum i.i.d. source with density operator $\rho$, we define the compression rate of $U_{\mathcal{B},C}$ by

$$\frac{1}{n}\Big\langle \big|\,U_{\mathcal{B},C}\,|z_1 z_2 z_3 \ldots z_n\rangle\,\big| \Big\rangle, \tag{9}$$

where $|z_1 z_2 z_3 \ldots z_n\rangle \in \mathcal{H}^{\otimes n}$ is a sequence emitted by the quantum source, and the length of the compressed quantum
sequence $U_{\mathcal{B},C}\,|z_1 z_2 z_3 \ldots z_n\rangle$ is divided by $n$ and averaged in the sense of a quantum-mechanical observable³.

If we "cut" the compressed sequence $U_{\mathcal{B},C}\,|z_1 z_2 z_3 \ldots z_n\rangle$ at a length "slightly more" than $\big\langle |U_{\mathcal{B},C}\,|z_1 z_2 z_3 \ldots z_n\rangle| \big\rangle$,
then it can be decompressed with high fidelity. Thus, $\big\langle |U_{\mathcal{B},C}\,|z_1 z_2 z_3 \ldots z_n\rangle| \big\rangle$ is the average number of qubits needed
to faithfully represent an uncompressed sequence of length n. See Q for a detailed treatment. 

Theorem 2 (Schumacher's Optimal Quantum Data Compression) For a quantum i.i.d. source with a
density operator $\rho$ and any compression scheme, the best achievable compression rate with a high expected fidelity
is given by the von Neumann entropy, which is defined by

$$S(\rho) \triangleq -\mathrm{tr}(\rho\log_2\rho). \tag{10}$$

Definition III.2 A compression code for which the above rate is achieved is called optimal.

Lemma III.1 For a density operator $\rho$, we choose the eigenbasis of $\rho$ to be our computational basis $\mathcal{B}$. Let $C_{p_\rho}$ be a
classical compression code which is optimal for the probability distribution formed by the set of the eigenvalues of $\rho$.
Then, the unitary transformation $U_{\mathcal{B},C_{p_\rho}}$ is an optimal quantum compression code for a quantum information source
with the density operator $\rho$.



³ Of course, the length of a quantum sequence is a quantum-mechanical variable. See for a detailed discussion on the subject
IV. "EFFECTIVE" SOURCE DENSITY MATRIX

We recall that a quantum (binary) identically and independently distributed (i.i.d.) information source is defined by a
density operator acting on a two-dimensional Hilbert space.

Let $\mathcal{B} = \{|b_i\rangle\}$ be our computational basis and suppose we have a quantum i.i.d. source with density operator $\rho$
acting on our two-dimensional computational Hilbert space. Let $\rho = \sum_i \lambda_i\,|\lambda_i\rangle\langle\lambda_i|$ be an orthonormal decomposition of $\rho$.

For every $\rho$ and $\mathcal{B}$, Jozsa and Presnell defined a so-called "effective" source density matrix $\tilde{\rho}$ with respect to
a (computational) orthonormal basis $\{|b_i\rangle\}$ as follows:

$$\tilde{\rho} = \sum_j \eta_j\,|b_j\rangle\langle b_j|,$$

where $\eta_j = \sum_i P_{ij}\lambda_i$ and $[P_{ij}]$ is a doubly stochastic matrix defined by

$$P_{ij} = \langle\lambda_i|b_j\rangle\,\langle b_j|\lambda_i\rangle \ge 0.$$

We observe the following obvious fact. If the source eigenbasis coincides with the computational basis, i.e. $\{|\lambda_i\rangle\} = \{|b_i\rangle\}$,
then the effective source density matrix coincides with the actual source density matrix, i.e. $\tilde{\rho} = \rho$.
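
The definition of the effective eigenvalues $\eta_j = \sum_i P_{ij}\lambda_i$ can be illustrated with a small numerical sketch. It assumes, purely for illustration, a real two-dimensional case in which the eigenbasis of the source is the computational basis rotated by an angle theta; the eigenvalues, the angle, and the function names are hypothetical.

```python
from math import cos, sin, pi


def overlap_matrix(eigvecs, basis):
    """P[i][j] = |<lambda_i|b_j>|^2 for real vectors."""
    return [[sum(x * y for x, y in zip(v, b)) ** 2 for b in basis] for v in eigvecs]


def effective_eigenvalues(eigvals, eigvecs, basis):
    """eta_j = sum_i P[i][j] * lambda_i."""
    P = overlap_matrix(eigvecs, basis)
    return [
        sum(P[i][j] * eigvals[i] for i in range(len(eigvals)))
        for j in range(len(basis))
    ]


# Hypothetical source: eigenvalues 0.9 and 0.1, eigenbasis rotated by 30 degrees.
theta = pi / 6
eigvals = [0.9, 0.1]
eigvecs = [(cos(theta), sin(theta)), (-sin(theta), cos(theta))]
basis = [(1.0, 0.0), (0.0, 1.0)]  # computational basis |b_0>, |b_1>

P = overlap_matrix(eigvecs, basis)
eta = effective_eigenvalues(eigvals, eigvecs, basis)
print(P, eta)
```

Here every row and column of $[P_{ij}]$ sums to one, and the effective eigenvalues come out as approximately (0.7, 0.3): the mismatch between the bases makes the effective distribution more uniform than the true spectrum (0.9, 0.1). Setting theta to zero recovers $\eta_j = \lambda_j$.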

V. MAIN RESULT 

Definition V.1 With any density matrix $\rho$, we associate a probability distribution formed by the density matrix's
eigenvalues and denote it by putting the density matrix notation as a subscript: $p_\rho$. That is, $p_\rho = \{\lambda_0, \lambda_1\}$,
where $\{\lambda_0, \lambda_1\}$ are the eigenvalues of $\rho$.

Lemma V.1 Let $\rho$ and $\sigma$ be density matrices defined on the same Hilbert space, with orthonormal decompositions
$\rho = \sum_i \lambda_i\,|\lambda_i\rangle\langle\lambda_i|$ and $\sigma = \sum_j \chi_j\,|\chi_j\rangle\langle\chi_j|$.

Let $\tilde{\rho}$ be the effective density matrix of $\rho$ with respect to the basis $\{|\chi_i\rangle\}$, and let $\{\eta_j\}$ be the eigenvalues of $\tilde{\rho}$.
Then, we have the following relation between quantum and classical entropies:

$$S(\rho\|\sigma) + S(\rho) = D(p_{\tilde{\rho}}\|p_\sigma) + H(p_{\tilde{\rho}}), \tag{11}$$

where $D(\cdot\|\cdot)$ is the classical relative entropy, $H(\cdot)$ is the classical (Shannon) entropy, and $p_{\tilde{\rho}}$ and $p_\sigma$ are the probability distributions
formed by the eigenvalue sets of $\tilde{\rho}$ and $\sigma$, respectively. That is, we have $p_{\tilde{\rho}} = \{\eta_j\}$ and $p_\sigma = \{\chi_j\}$.

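Proof 1 By the definition of the effective density matrix, we have

$$\eta_j = \sum_i P_{ij}\lambda_i,$$

where $[P_{ij}]$ is a doubly stochastic matrix, defined by

$$P_{ij} = \langle\lambda_i|\chi_j\rangle\,\langle\chi_j|\lambda_i\rangle.$$

By the definition of quantum relative entropy, we have

$$S(\rho\|\sigma) = \mathrm{tr}(\rho\log\rho) - \mathrm{tr}(\rho\log\sigma).$$

First, we look at the quantity $\mathrm{tr}(\rho\log\sigma)$:

$$\mathrm{tr}(\rho\log\sigma) = \mathrm{tr}\Big(\rho\log\sigma \sum_i |\lambda_i\rangle\langle\lambda_i|\Big) = \sum_i \mathrm{tr}\big(\rho\log\sigma\,|\lambda_i\rangle\langle\lambda_i|\big) = \sum_i \langle\lambda_i|\rho\log\sigma|\lambda_i\rangle.$$

Substituting the identity $\langle\lambda_i|\rho = \lambda_i\langle\lambda_i|$ into the above equation, we get

$$\mathrm{tr}(\rho\log\sigma) = \sum_i \lambda_i\,\langle\lambda_i|\log\sigma|\lambda_i\rangle. \tag{12}$$

On the other hand, we have the following identity for the log function:

$$\log\sigma = \sum_j \log(\chi_j)\,|\chi_j\rangle\langle\chi_j| \tag{13}$$

and, therefore, substituting the right-hand side of (13) into $\langle\lambda_i|\log\sigma|\lambda_i\rangle$, we get

$$\langle\lambda_i|\log\sigma|\lambda_i\rangle = \langle\lambda_i|\Big(\sum_j \log(\chi_j)\,|\chi_j\rangle\langle\chi_j|\Big)|\lambda_i\rangle = \sum_j \log(\chi_j)\,\langle\lambda_i|\chi_j\rangle\,\langle\chi_j|\lambda_i\rangle = \sum_j \log(\chi_j)\,P_{ij}. \tag{14}$$

Combining equations (12) and (14), we obtain

$$\mathrm{tr}(\rho\log\sigma) = \sum_i \lambda_i \sum_j \log(\chi_j)\,P_{ij} = \sum_j \log(\chi_j) \sum_i \lambda_i P_{ij} = \sum_j \eta_j\log(\chi_j). \tag{15}$$

Substituting the right-hand side of the quantum entropy identity $S(\rho) = -\sum_i \lambda_i\log\lambda_i$ and that of (15) into the
definition of quantum relative entropy, we have

$$S(\rho\|\sigma) = \sum_i \lambda_i\log\lambda_i - \sum_j \eta_j\log\chi_j = \sum_i \lambda_i\log\lambda_i + \sum_j \eta_j\log\frac{\eta_j}{\chi_j} - \sum_j \eta_j\log\eta_j. \tag{16}$$

Thus, we get

$$S(\rho\|\sigma) = -S(\rho) + D(\eta_j\|\chi_j) + H(\eta_j), \tag{17}$$

where $D(\eta_j\|\chi_j)$ is the (classical) relative entropy of the probability distributions $\eta_j$ and $\chi_j$, and $H(\eta_j)$ is the (classical)
entropy of the distribution $\eta_j$. We can rewrite (17) as (11), which completes the proof.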
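
The identity of Lemma V.1 can be sanity-checked numerically. The sketch below is an illustration only: it assumes real two-dimensional density matrices whose eigenbases are rotations of a common reference basis, with strictly positive, hypothetical eigenvalues, and uses base-2 logarithms. The quantum side is computed from matrices via the trace, the classical side from the eigenvalue distributions; all helper names are assumptions.

```python
from math import cos, sin, log2


def rotated_basis(theta):
    """Orthonormal real basis obtained by rotating the reference basis by theta."""
    return [(cos(theta), sin(theta)), (-sin(theta), cos(theta))]


def from_spectrum(eigvals, eigvecs, f=lambda x: x):
    """Return sum_k f(eigvals[k]) |v_k><v_k| as a 2x2 real matrix."""
    M = [[0.0, 0.0], [0.0, 0.0]]
    for val, vec in zip(eigvals, eigvecs):
        for r in range(2):
            for c in range(2):
                M[r][c] += f(val) * vec[r] * vec[c]
    return M


def trace_product(A, B):
    """tr(A B) for 2x2 matrices."""
    return sum(A[r][c] * B[c][r] for r in range(2) for c in range(2))


def overlap_sq(u, v):
    """|<u|v>|^2 for real 2-vectors."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2


# Hypothetical spectra and eigenbases of rho and sigma.
lam, lam_vecs = [0.9, 0.1], rotated_basis(0.3)
chi, chi_vecs = [0.6, 0.4], rotated_basis(1.1)

rho = from_spectrum(lam, lam_vecs)
log_sigma = from_spectrum(chi, chi_vecs, f=log2)

# Quantum side: S(rho||sigma) + S(rho).
S_rho = -sum(l * log2(l) for l in lam)
S_rel = sum(l * log2(l) for l in lam) - trace_product(rho, log_sigma)
quantum_side = S_rel + S_rho

# Classical side: D(eta||chi) + H(eta), with eta_j = sum_i P_ij lambda_i.
eta = [sum(overlap_sq(lv, cv) * l for l, lv in zip(lam, lam_vecs)) for cv in chi_vecs]
D = sum(e * log2(e / c) for e, c in zip(eta, chi))
H = -sum(e * log2(e) for e in eta)
classical_side = D + H

print(quantum_side, classical_side)
```

Both sides agree up to floating-point rounding, as Equation (11) predicts. The check is not a proof; it only confirms the algebra for one concrete choice of parameters.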

In Subsection II A we introduced the notation $C_p$ to denote a classical data compression code (mapping) optimal
for a source probability distribution $p$. Then, according to our notation, the code $C_{p_\sigma}$ will be optimal for a (classical)
i.i.d. source with the probability distribution formed by the eigenvalues of a density matrix $\sigma$.

Lemma V.2 (Classical data compression and relative entropy) Let $z^n = z_1, \ldots, z_n$ be a sequence emitted by
a classical i.i.d. source with a marginal probability distribution formed by the eigenvalues of $\tilde{\rho}$. Then, from classical
data compression theory, we have

$$\frac{1}{n}\big\langle |C_{p_\sigma}(z^n)| \big\rangle_{p_{\tilde{\rho}}} = D(p_{\tilde{\rho}}\|p_\sigma) + H(p_{\tilde{\rho}}), \tag{18}$$

where the notation on the left-hand side stands for the length of the compressed sequence (codeword) divided by $n$ and
averaged with respect to the probability measure $p_{\tilde{\rho}}$.

Combining Lemma V.1, Lemma V.2, and the results of Section III, we obtain the following theorem:

Theorem 3 (Quantum Data Compression and Relative Entropy) Let $|z_1 z_2 z_3 \ldots z_n\rangle \in \mathcal{H}^{\otimes n}$ be emitted by a
quantum i.i.d. source with density operator $\rho$ defined on a Hilbert space $\mathcal{H}$. Let $\sigma$ be an arbitrary density operator
defined on the same Hilbert space $\mathcal{H}$.

We choose the eigenbasis of $\sigma$ to be our computational basis $\mathcal{B}$, and let $\tilde{\rho}$ be the effective matrix of $\rho$ with respect to $\mathcal{B}$.
Then, the following relations hold:

$$S(\rho\|\sigma) + S(\rho) = D(p_{\tilde{\rho}}\|p_\sigma) + H(p_{\tilde{\rho}}) = \frac{1}{n}\Big\langle \big|\,U_{\mathcal{B},C_{p_\sigma}}\,|z_1 z_2 z_3 \ldots z_n\rangle\,\big| \Big\rangle, \tag{19}$$

where the notation on the right-hand side stands for the length of the compressed quantum sequence divided by $n$ and
averaged in the sense of a quantum-mechanical observable.

Remark 3 Quantum relative entropy $S(\rho\|\sigma)$ has the following operational meaning. If a quantum compression code,
optimal for one quantum information source with a density operator $\sigma$, is applied to another source with a density
operator $\rho$, then the increase of the compression rate is equal to $S(\rho\|\sigma)$.

Remark 4 By fixing $\rho$ and the computational basis $\mathcal{B}$ (i.e. the eigenbasis of $\sigma$), we also fix $\tilde{\rho}$. Then, for a
fixed $\tilde{\rho}$, $D(p_{\tilde{\rho}}\|p_\sigma)$ is minimized (and is therefore equal to zero) when $\sigma = \tilde{\rho}$. Then, as follows from Theorem 3, for
an arbitrary computational basis $\mathcal{B}$, the best possible compression rate is equal to $H(p_{\tilde{\rho}}) \ge S(\rho)$ and is achieved with
the transformation $U_{\mathcal{B},C_{p_{\tilde{\rho}}}}$. Such a setup is called "mismatched bases" compression.

If we choose the eigenbasis of $\rho$ to be our computational basis $\mathcal{B}$, then we have $\tilde{\rho} = \rho$, and optimal quantum
compression is achieved with $U_{\mathcal{B},C_{p_\rho}}$, with compression rate $H(p_{\tilde{\rho}}) = S(\rho)$.
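
The effect of a mismatched basis on the best achievable rate can be sketched numerically. The example below assumes a real two-dimensional source whose eigenbasis is rotated by an angle theta relative to the computational basis, so that the effective distribution has first component $\cos^2\theta\,\lambda_0 + \sin^2\theta\,(1 - \lambda_0)$; the eigenvalue 0.9 and the function names are hypothetical.

```python
from math import cos, sin, log2, pi


def binary_entropy(p):
    """Entropy in bits of the distribution (p, 1 - p)."""
    return -sum(x * log2(x) for x in (p, 1 - p) if x > 0)


def effective_rate(lam0, theta):
    """H of the effective distribution when the source eigenbasis is rotated by theta."""
    eta0 = cos(theta) ** 2 * lam0 + sin(theta) ** 2 * (1 - lam0)
    return binary_entropy(eta0)


lam0 = 0.9
S_rho = binary_entropy(lam0)  # von Neumann entropy of the source, in bits

# Sweep theta from 0 (matched bases) to pi/2 in steps of pi/16.
rates = [effective_rate(lam0, k * pi / 16) for k in range(9)]
print(S_rho, rates)
```

The rate equals $S(\rho)$ when the bases match (theta = 0), never drops below it, and reaches one full bit per qubit at theta = pi/4, where the effective distribution is uniform.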

Discussion 1 As long as $U_{\mathcal{B},C}$ is computed in a computational basis $\mathcal{B}$, we can simply replace the original source
density matrix $\rho$ with $\tilde{\rho}$, the effective one with respect to the computational basis. Then, our quantum
compression/coding task is effectively reduced to a classical one, with classical information sources having probability
distributions $p_{\tilde{\rho}}$ and $p_\sigma$. Therefore, all relevant results from classical information theory and data compression become
applicable. We have already seen one such result, the statement of data compression (non-)optimality $D(p_{\tilde{\rho}}\|p_\sigma) + H(p_{\tilde{\rho}})$.

To give another example, we consider the following task of quantum relative entropy estimation. Suppose we have
two quantum i.i.d. sources with unknown density matrices $\rho$ and $\sigma$, where $\rho$ and $\sigma$ are defined on the same Hilbert space.
We want to estimate the quantum relative entropy $S(\rho\|\sigma)$.

Let $|z_1 z_2 z_3 \ldots z_n\rangle$ and $|x_1 x_2 x_3 \ldots x_n\rangle$ be emitted by the sources with density matrices $\rho$ and $\sigma$, respectively. From
Theorem 3, one could hope to estimate the quantity $S(\rho\|\sigma) + S(\rho)$ by simply measuring the average length of
the codeword $U_{\mathcal{B},C_{p_\sigma}}\,|z_1 z_2 z_3 \ldots z_n\rangle$, then estimate $S(\rho)$ by measuring the average length of the codeword $U_{\mathcal{B},C_{p_\rho}}\,|z_1 z_2 z_3 \ldots z_n\rangle$,
and then subtract the latter from the former. For such a straightforward approach to work, one should have quantum
compression codes $U_{\mathcal{B},C_{p_\rho}}$ and $U_{\mathcal{B},C_{p_\sigma}}$, which are optimal for sources with density matrices $\rho$ and $\sigma$, respectively. But
as $\rho$ and $\sigma$ are unknown, so are $U_{\mathcal{B},C_{p_\rho}}$ and $U_{\mathcal{B},C_{p_\sigma}}$. Fortunately, in classical data compression, there exist so-called
"universal compression codes", which allow one to estimate the (classical) quantities $D(p_{\tilde{\rho}}\|p_\sigma) + H(p_{\tilde{\rho}})$ and $H(p_{\tilde{\rho}})$ even
without knowing the probability distributions $p_{\tilde{\rho}}$ and $p_\sigma$. So, based on these classical codes, we can construct quantum
codes (unitary transformations) to estimate $S(\rho\|\sigma)$.

VI. CONCLUSION 

We have provided a simple and constructive proof of a quantum analog of a well-known classical information
theory result on the role of relative entropy as a measure of non-optimality in data compression. We have constructed
quantum compression codes (schemes) from classical data compression codes. Moreover, as the quantum data
compression/coding task can be effectively reduced to a (quasi-)classical one, we have shown that relevant results from classical
information theory and data compression become applicable and therefore can be extended to the quantum domain.